Accurate identification of polyadenylation sites from 3′ end deep sequencing using a naïve Bayes classifier
Authors
Abstract
Motivation: 3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3′ ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Although heuristic filters have been applied in these cases, they typically result in a high proportion of both false-positive and false-negative classifications. Therefore, there is a need to develop improved algorithms to better identify mis-priming events in oligo-dT primed sequences.
Results: By analyzing sequence features flanking 3′ ends derived from oligo-dT-based sequencing, we developed a naïve Bayes classifier to classify them as true or false/internally primed. The resulting algorithm is highly accurate, outperforms previous heuristic filters and facilitates identification of novel polyadenylation sites.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on May 5, 2013; revised on June 25, 2013; accepted on
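To make the classification idea concrete, the following is a minimal sketch of a naïve Bayes classifier over binary features taken from the sequence flanking a candidate 3′ end. It is not the authors' implementation: the two features (an upstream AATAAA hexamer and an A-rich downstream window), the window size, the threshold, the label names and the toy training sequences are all assumptions made for illustration, since the abstract does not specify which features were used.

```python
import math

def extract_features(upstream, downstream):
    """Binary features from the flanks of a candidate 3' end (illustrative choices only)."""
    return (
        "AATAAA" in upstream,             # assumed feature: canonical poly(A) signal upstream
        downstream[:10].count("A") >= 6,  # assumed feature: A-rich stretch that could prime oligo-dT
    )

def train(examples):
    """Estimate class priors and Laplace-smoothed feature probabilities.

    examples: list of (feature_tuple, label) pairs.
    Returns {label: (prior, [P(feature_i is True | label), ...])}.
    """
    model = {}
    total = len(examples)
    for label in {lab for _, lab in examples}:
        rows = [feats for feats, lab in examples if lab == label]
        n = len(rows)
        probs = [(sum(r[i] for r in rows) + 1) / (n + 2) for i in range(len(rows[0]))]
        model[label] = (n / total, probs)
    return model

def classify(model, features):
    """Return the label with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for present, p in zip(features, probs):
            score += math.log(p if present else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training data: (upstream, downstream) flanks with hypothetical labels.
TOY_SITES = [
    (("GCTAATAAAGTCTGACC", "TGTGTCTGCC"), "true"),
    (("CCAATAAATGGCATTCG", "GTCTGTTTGC"), "true"),
    (("TTAATAAACCGTAGGCA", "CATGAAAAAA"), "true"),
    (("GCTGCCGTAGCTAGGCT", "AAAAAAAAGA"), "internal"),
    (("CGATCGGCTAGCTAGGA", "AAAAGAAAAA"), "internal"),
    (("GGAATAAAGCTAGCTAG", "AAAAAAAAAA"), "internal"),
]

model = train([(extract_features(up, down), label) for (up, down), label in TOY_SITES])

# A site with an upstream signal and no downstream A-run, and the opposite case.
site_a = classify(model, extract_features("ACGTAATAAAGGCTCTG", "TGCCTGTGTC"))
site_b = classify(model, extract_features("GCGCGCTAGCTAGCTGC", "AAAAAAAAAA"))
```

Unlike a hard heuristic filter, which would discard every site followed by an A-rich stretch, a probabilistic model of this kind weighs each feature against the others, so a site with both a strong upstream signal and a downstream A-run is not rejected automatically.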
Related articles
Accurate identification of polyadenylation sites from 3′ end deep sequencing using a naïve Bayes classifier
Motivation: 3′ end processing is important for transcription termination, mRNA stability and regulation of gene expression. To identify 3′ ends, most techniques use an oligo-dT primer to construct deep sequencing libraries. However, this approach can lead to identification of artifactual polyadenylation sites due to internal priming in homopolymeric stretches of adenines. Although heuristic filt...
Application of a Naïve Bayes Classifier to Assign Polyadenylation Sites from 3′ End Deep Sequencing Data: A Dissertation
Cleavage and polyadenylation of a precursor mRNA is important for transcription termination, mRNA stability, and regulation of gene expression. This process is directed by a multitude of protein factors and cis elements in the pre-mRNA sequence surrounding the cleavage and polyadenylation site. Importantly, the location of the cleavage and polyadenylation site helps define the 3’ untranslated r...
Combining multi-species genomic data for microRNA identification using a Naïve Bayes classifier
Motivation: Most computational methodologies for microRNA gene prediction utilize techniques based on sequence conservation and/or structural similarity. In this study we describe a new technique, which is applicable across several species, for predicting miRNA genes. This technique is based on machine learning, using the Naïve Bayes classifier. It automatically generates a model from the train...
Learning Naïve Bayes Classifiers From Attribute Value Taxonomies and Partially Specified Data
Partially specified data are commonplace in many practical applications of machine learning where different instances are described at different levels of precision relative to an attribute value taxonomy (AVT). This paper describes AVTNBL, an extension of the Naïve Bayes Learning algorithm that effectively exploits user-supplied attribute value taxonomies to construct compact and accurate Naïve...
Towards Biometric Person Identification using fNIRS
We investigate the potential of using fNIRS signals for biometric person identification. Independent sessions for training and testing have been recorded using 8 channels of frontal fNIRS. We extract logarithmic power spectral densities as features to train and test a Naïve Bayes Classifier. We evaluate different frequency bands and report classification results for different trial lengths.
Journal title:
Volume, issue:
Pages: -
Publication date: 2013